-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GoogleCloudSpannerReceiver: Mask lock stats PII #14607
Conversation
Would using the redaction processor work for you? |
@mx-psi Hey! I don't think redaction processor will work for my use-case. Example: if value is "Table_name(key1, key2, key1...)" , i will modify it to be "Table_name(hash1, hash2, hash1...)" if the user wants to. As you can see, each key will correspond to a hashed value, and table_name will not be hashed. |
Right, then I think the transform processor should work if you use Either way, I think the status quo is to accept overlapping functionality if it makes it simpler to configure a pipeline, so no objection from my side to merging this after a review. |
|
||
func (mdp *MetricsDataPoint) HideLockStatsRowrangestartkeyPII() { | ||
for index, labelValue := range mdp.labelValues { | ||
if labelValue.Metadata().Name() == "row_range_start_key" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this label available on a single metric or on multiple metrics? (Trying to figure out how the configuration should look like)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This label is available on a single metric.
func parseAndHashRowrangestartkey(key string) string { | ||
startIndexKeys := strings.Index(key, "(") | ||
substring := key[startIndexKeys+1 : len(key)-1] | ||
hashedKey := key[:startIndexKeys+1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest using a strings.Builder here for performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done . Updated PR is at #16343 because this one got closed because of inactivity
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Description: Cloud Spanner receiver should have the capability to mask sensitive information. This PR adds a configurable option to mask the PII in lock stats metrics for customers.
For the metric "top minute lock stats", the label "row_range_start_key" (https://cloud.google.com/spanner/docs/introspection/lock-statistics#explain-row-range) has PII information of table key values. We have now given the ability to hash the keys individually such that
table_name(key1,key2) becomes table_name(hash1,hash2).
Note that even though key values are hashed, a user can identify if sets of keys share common prefix like table1(key1,key2) and table1(key1,key3)
Link to tracking Issue: Google-Internal customer-request.
Testing: Unit tests added. Manual end-to-end testing done by running the service locally and verifying the hashed row_range_start_key values.
Documentation: Added the configuration option and the corresponding info in the README